Literature Review

As discussed on the introduction page, global warming and the greenhouse effect have become pressing concerns in recent years. The greenhouse effect is driven by global radiative forcing, as explained on the introduction page (Chandler, Explained: Radiative forcing). Global radiative forcing, in turn, is caused by the increasing concentration of greenhouse gases such as carbon dioxide, methane, nitrous oxide, and fluorinated gases. Carbon dioxide is the most predominant of these gases (National Geographic, Greenhouse effect).

The pie charts from the EIA show that U.S. carbon dioxide emissions come from three sources: petroleum (46%), natural gas (34%), and coal (21%). These energy sources account for 66% of total U.S. CO2 emissions (U.S. Energy Information Administration - EIA - independent statistics and analysis) and for 73.2% of total greenhouse gas emissions (Ritchie et al., Emissions by sector). The pie chart also shows that 21% of U.S. energy consumption is non-fossil, namely nuclear and renewable energy (U.S. Energy Information Administration - EIA - independent statistics and analysis).

One of the project goals is to determine whether total U.S. CO2 emissions decrease as U.S. energy consumption shifts from traditional sources such as petroleum, natural gas, and coal toward clean energy such as nuclear and renewables, and whether future CO2 emissions can be predicted from these energy-source variables. The total monthly petroleum, natural gas, coal, and renewable energy consumption datasets were collected and shown in the previous tabs; they will serve as explanatory variables for CO2 emissions in the time series models below.

ARMAX/ARIMAX/SARIMAX

The names ARMAX and ARIMAX are extensions of ARMA and ARIMA, respectively. The X added to the end stands for “exogenous”: these models add one or more outside variables to help explain the endogenous variable.

ARMAX - an AutoRegressive Moving Average with eXogenous terms (ARMAX) model. Unlike the autoregressive with exogenous terms (ARX) model, the system structure of an ARMAX model includes stochastic dynamics. ARMAX models are useful when dominating disturbances enter early in the process, such as at the input; a wind gust affecting an aircraft is one example of such a disturbance. This makes the ARMAX model more flexible than the ARX model in handling disturbances.

ARIMAX - An Autoregressive Integrated Moving Average with Explanatory Variable (ARIMAX) model.

SARIMAX - Seasonal AutoRegressive Integrated Moving Average with eXogenous regressors model
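The common core of these models can be sketched with base R alone: stats::arima() accepts an xreg argument, which is exactly the "eXogenous" part of an ARMAX/ARIMAX-style fit. The sketch below uses simulated data (the series y, the regressor x, and all coefficients are made up for illustration, not taken from the project data).

```r
# Minimal ARIMAX-style sketch with base R's stats::arima(); simulated data.
set.seed(42)
n <- 200
x <- rnorm(n)                                  # exogenous regressor
e <- arima.sim(model = list(ar = 0.6), n = n)  # AR(1) error process
y <- 2 + 1.5 * x + e                           # endogenous series

# AR(1) errors plus the exogenous term x:
fit <- arima(y, order = c(1, 0, 0), xreg = x)
coef(fit)  # ar1, intercept, and the coefficient on x (close to 1.5)
```

forecast::auto.arima(), used later in this section, exposes the same idea through its own xreg argument and additionally searches over the (seasonal) ARIMA orders.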

Modeling U.S. Total CO2 Emissions Data - SARIMAX

Plotting the Data

Code
emissions_df<-read.csv("data/df_total_monthly_CO2_emissions.csv", skip=1, row.names = 1)
renewable_df <- read.csv("data/df_total_monthly_renewable_consumption.csv", skip=1, row.names = 1)
naturalgas_df <- read.csv("data/df_total_monthly_natural_gas_consumption.csv", skip=1, row.names = 1)
coal_df <- read.csv("data/df_total_monthly_coal_consumption.csv", row.names = 1)
petroleum_df <- read.csv("data/df_total_monthly_df_monthly_petroleum_consumption.csv", skip=1, row.names = 1)

dd <- data.frame(emissions_df$X.1,emissions_df$sum,renewable_df$sum,petroleum_df$sum,naturalgas_df$sum,coal_df$value)
#dd<-dd[,c(1,2,4,6)]
colnames(dd)<-c("DATE","CO2 Emissions","Renewable","Petroleum","Natural Gas","Coal")
knitr::kable(head(dd))
DATE CO2 Emissions Renewable Petroleum Natural Gas Coal
1973-01 565.577 403.982 3185.269 2395.856 871.911
1973-02 514.236 360.901 2941.986 2169.382 692.252
1973-03 506.524 400.161 2942.485 2056.331 676.722
1973-04 461.479 380.470 2621.315 1872.777 785.359
1973-05 474.085 392.142 2836.568 1764.857 999.503
1973-06 469.615 377.232 2731.435 1566.171 1021.716
Code
dd.ts <- ts(dd, start=decimal_date(as.Date("1973-01-01", format = "%Y-%m-%d")), frequency = 12)

autoplotly(dd.ts[,c(2:6)], facets=TRUE) +
  xlab("Year") + ylab("") +
  ggtitle("Variables influencing CO2 Emissions in USA")

As stated in the literature review, we are interested in the impact of the major U.S. energy sources on total U.S. CO2 emissions. The energy data were collected and are shown in the data sources tab; we further used these data to build models and predictions in the ARMA/ARIMA/SARIMA Models tab. CO2 emissions are measured in million metric tons of carbon dioxide; renewable energy, petroleum, natural gas, and coal are measured in trillion Btu. The time series starts in Jan 1973 and ends in Dec 2022.

Fitting a SARIMAX model using auto.arima()

We use the auto.arima() function to fit the SARIMAX model, predicting CO2 Emissions from Renewable Energy, Petroleum, Natural Gas, and Coal. All variables are time series, and the exogenous variables in this case are Renewable Energy, Petroleum, Natural Gas, and Coal.

Code
# "CO2 Emissions","Renewable","Petroleum","Natural Gas","Coal"
xreg <- cbind(Rnwbl = dd.ts[, "Renewable"],
              Ptlm = dd.ts[, "Petroleum"],
              Ntlgs = dd.ts[, "Natural Gas"],
              Cl = dd.ts[, "Coal"])

fit <- auto.arima(dd.ts[, "CO2 Emissions"], xreg = xreg, seasonal= TRUE)
summary(fit)
Series: dd.ts[, "CO2 Emissions"] 
Regression with ARIMA(4,0,0)(0,1,1)[12] errors 

Coefficients:
         ar1     ar2     ar3     ar4     sma1    Rnwbl    Ptlm   Ntlgs       Cl
      0.7384  0.0817  0.0343  0.1269  -0.7452  -0.0017  0.0728  0.1218  -0.0065
s.e.  0.0417  0.0510  0.0512  0.0412   0.0287   0.0166  0.0045  0.0040   0.0069

sigma^2 = 85.53:  log likelihood = -2143.09
AIC=4306.19   AICc=4306.57   BIC=4349.95

Training set error measures:
                     ME     RMSE      MAE         MPE    MAPE      MASE
Training set -0.1968328 9.084863 6.881898 -0.03815555 1.18767 0.3113222
                     ACF1
Training set -0.003652219
Code
checkresiduals(fit)


    Ljung-Box test

data:  Residuals from Regression with ARIMA(4,0,0)(0,1,1)[12] errors
Q* = 33.387, df = 19, p-value = 0.02168

Model df: 5.   Total lags used: 24

From the results above, this is a regression model with SARIMA(4,0,0)(0,1,1)[12] errors. Note that the Ljung-Box p-value (0.02168) is below 0.05, so some residual autocorrelation remains.

Fitting the model manually - Regression with ARMA Errors

Here we first fit a linear regression model predicting CO2 Emissions from Renewable Energy, Petroleum, Natural Gas, and Coal.

Then we fit an ARIMA/SARIMA model to the residuals.

First fit the linear model

Code
dd$`CO2 Emissions` <- ts(dd$`CO2 Emissions`, start=decimal_date(as.Date("1973-01-01", format = "%Y-%m-%d")), frequency = 12)

dd$Renewable <- ts(dd$Renewable, start=decimal_date(as.Date("1973-01-01", format = "%Y-%m-%d")), frequency = 12)

dd$Petroleum <- ts(dd$Petroleum, start=decimal_date(as.Date("1973-01-01", format = "%Y-%m-%d")), frequency = 12)

dd$`Natural Gas` <- ts(dd$`Natural Gas`, start=decimal_date(as.Date("1973-01-01", format = "%Y-%m-%d")), frequency = 12)

dd$Coal <- ts(dd$Coal, start=decimal_date(as.Date("1973-01-01", format = "%Y-%m-%d")), frequency = 12)
# First fit the linear model
fit.reg <- lm(`CO2 Emissions` ~ Renewable + Petroleum + `Natural Gas` + Coal, data=dd)
summary(fit.reg)

Call:
lm(formula = `CO2 Emissions` ~ Renewable + Petroleum + `Natural Gas` + 
    Coal, data = dd)

Residuals:
     Min       1Q   Median       3Q      Max 
-107.596  -27.274   -1.138   24.142  126.391 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)   -2.015e+02  2.092e+01  -9.630  < 2e-16 ***
Renewable     -1.098e-01  1.316e-02  -8.348 4.88e-16 ***
Petroleum      1.957e-01  6.690e-03  29.252  < 2e-16 ***
`Natural Gas`  5.652e-02  4.124e-03  13.705  < 2e-16 ***
Coal           1.088e-01  4.877e-03  22.308  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 39.52 on 595 degrees of freedom
Multiple R-squared:  0.7462,    Adjusted R-squared:  0.7445 
F-statistic: 437.3 on 4 and 595 DF,  p-value: < 2.2e-16

Linear Model Residuals

Code
res.fit <- ts(residuals(fit.reg), start=decimal_date(as.Date("1973-01-01", format = "%Y-%m-%d")), frequency = 12)


ggtsdisplay(res.fit)

Code
res.fit |> diff() |> ggtsdisplay()

Code
res.fit |> diff() |> diff(lag=12) |> ggtsdisplay()

Stationarity and Seasonality Test

Code
res.fit |> diff() |> diff(lag=12) |> adf.test()
Warning in adf.test(diff(diff(res.fit), lag = 12)): p-value smaller than
printed p-value

    Augmented Dickey-Fuller Test

data:  diff(diff(res.fit), lag = 12)
Dickey-Fuller = -10.178, Lag order = 8, p-value = 0.01
alternative hypothesis: stationary
Code
diff.res.fit <- res.fit |> diff() |> diff(lag=12) 
isSeasonal(diff.res.fit, test = "combined", freq = NA)
[1] FALSE

After first differencing and seasonal differencing, the ADF test indicates the series is stationary (p-value ≤ 0.01) and the seasonality test returns FALSE. Now let’s find the model parameters for the residuals of the time series linear regression.
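As a toy illustration of what the diff() |> diff(lag=12) pipeline above does, consider a simulated monthly series with a linear trend and an annual cycle (all numbers here are made up for the example):

```r
# First differencing removes the trend; the lag-12 (seasonal) difference
# removes the annual cycle, leaving a series that fluctuates around zero.
set.seed(123)
t <- 1:240
y <- ts(0.5 * t + 10 * sin(2 * pi * t / 12) + rnorm(240), frequency = 12)

dy <- diff(diff(y), lag = 12)

mean(y)   # large, dominated by the trend
mean(dy)  # near zero after both differences
```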

Find the Model Parameters

Code
# Grid search over SARIMA(p,1,q)(P,1,Q)[12] candidates, recording AIC/BIC/AICc
SARIMA.c <- function(p1, p2, q1, q2, P1, P2, Q1, Q2, data){
  
  d <- 1
  D <- 1
  s <- 12
  
  i <- 1
  ls <- matrix(rep(NA, 9 * 200), nrow = 200)
  
  for (p in p1:p2){
    for (q in q1:q2){
      for (P in P1:P2){
        for (Q in Q1:Q2){
          if (p + q + P + D + Q <= 8){
            model <- Arima(data, order = c(p-1, d, q-1), seasonal = c(P-1, D, Q-1))
            ls[i, ] <- c(p-1, d, q-1, P-1, D, Q-1, model$aic, model$bic, model$aicc)
            i <- i + 1
          }
        }
      }
    }
  }
  
  temp <- as.data.frame(ls)
  names(temp) <- c("p", "d", "q", "P", "D", "Q", "AIC", "BIC", "AICc")
  
  temp
}
Code
# search over p=0..2, q=0..2, P=0..2, Q=0..2 (with d=1, D=1, s=12)

output <- SARIMA.c(p1=1, p2=3, q1=1, q2=3, P1=1, P2=3, Q1=1, Q2=3, data=res.fit) |>
  drop_na()

minaic <- output[which.min(output$AIC), ]
minbic <- output[which.min(output$BIC), ]
Code
knitr::kable(minaic)
p d q P D Q AIC BIC AICc
25 1 1 1 0 1 1 4877.404 4894.904 4877.473
Code
knitr::kable(minbic)
p d q P D Q AIC BIC AICc
25 1 1 1 0 1 1 4877.404 4894.904 4877.473
Code
auto.arima(res.fit)
Series: res.fit 
ARIMA(2,0,1)(2,1,1)[12] 

Coefficients:
        ar1      ar2      ma1    sar1    sar2     sma1
      1.392  -0.4079  -0.7879  0.0674  -0.072  -0.8284
s.e.  0.079   0.0732   0.0580  0.0506   0.049   0.0304

sigma^2 = 226.2:  log likelihood = -2432.01
AIC=4878.03   AICc=4878.22   BIC=4908.67

The best model from the grid search is SARIMA(1,1,1)x(0,1,1)[12], while auto.arima() suggests SARIMA(2,0,1)x(2,1,1)[12].


Model diagnostics

From model fitting, the grid search produced one model, SARIMA(1,1,1)x(0,1,1)[12], while auto.arima() produced SARIMA(2,0,1)x(2,1,1)[12]. SARIMA(1,1,1)x(0,1,1)[12] has the lower AIC, BIC, and AICc. Now let’s run diagnostics on both models to analyze the results and find the better model for forecasting.

SARIMA(2,0,1)(2,1,1)[12]

Code
set.seed(1234)
fit1 <- Arima(res.fit, order=c(2,0,1), seasonal = list(order = c(2,1,1), period = 12))

Model Fitting Visual Results and Residuals

Code
model_output <- capture.output(sarima(res.fit,2,0,1,2,1,1,12))

Code
res.fit |>
  Arima(order=c(2,0,1), seasonal = list(order = c(2,1,1), period = 12)) |>
  residuals() |> ggtsdisplay()

Code
checkresiduals(fit1)


    Ljung-Box test

data:  Residuals from ARIMA(2,0,1)(2,1,1)[12]
Q* = 36.154, df = 18, p-value = 0.006743

Model df: 6.   Total lags used: 24

The Ljung-Box test uses the following hypotheses:

H0: The residuals are independently distributed.

HA: The residuals are not independently distributed; they exhibit serial correlation.

Ideally, we would like to fail to reject the null hypothesis. That is, we would like to see the p-value of the test be greater than 0.05 because this means the residuals for our time series model are independent, which is often an assumption we make when creating a model.
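The test itself is available in base R as stats::Box.test(); the toy series below (simulated, not the model residuals above) show both outcomes:

```r
# Ljung-Box on independent residuals vs. serially correlated ones.
set.seed(1)
wn <- rnorm(500)                                  # white noise: H0 should hold
ar <- arima.sim(model = list(ar = 0.8), n = 500)  # strong serial correlation

Box.test(wn, lag = 24, type = "Ljung-Box")$p.value  # typically large
Box.test(ar, lag = 24, type = "Ljung-Box")$p.value  # essentially zero
```

checkresiduals() performs the same test but also adjusts the degrees of freedom for the number of fitted model parameters (the fitdf argument of Box.test()).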


There are two significant spikes in the ACF, and the model fails the Ljung-Box test (p = 0.0067 < 0.05), indicating some remaining serial correlation in the residuals.

Model Fitting

Code
res.fit |>
  Arima(order=c(2,0,1), seasonal = list(order=c(2,1,1), period=12))
Series: res.fit 
ARIMA(2,0,1)(2,1,1)[12] 

Coefficients:
        ar1      ar2      ma1    sar1    sar2     sma1
      1.392  -0.4079  -0.7879  0.0674  -0.072  -0.8284
s.e.  0.079   0.0732   0.0580  0.0506   0.049   0.0304

sigma^2 = 226.2:  log likelihood = -2432.01
AIC=4878.03   AICc=4878.22   BIC=4908.67

Model Output Diagnostics

Code
cat(model_output[65:100], model_output[length(model_output)], sep = "\n") 
$fit

Call:
arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), 
    xreg = constant, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc, 
        REPORT = 1, reltol = tol))

Coefficients:
         ar1      ar2      ma1    sar1     sar2     sma1  constant
      1.3922  -0.4080  -0.7880  0.0674  -0.0721  -0.8284    0.0019
s.e.  0.0791   0.0733   0.0582  0.0506   0.0490   0.0304    0.1239

sigma^2 estimated as 223.9:  log likelihood = -2432.01,  aic = 4880.03

$degrees_of_freedom
[1] 581

$ttable
         Estimate     SE  t.value p.value
ar1        1.3922 0.0791  17.6042  0.0000
ar2       -0.4080 0.0733  -5.5656  0.0000
ma1       -0.7880 0.0582 -13.5489  0.0000
sar1       0.0674 0.0506   1.3319  0.1834
sar2      -0.0721 0.0490  -1.4715  0.1417
sma1      -0.8284 0.0304 -27.2535  0.0000
constant   0.0019 0.1239   0.0154  0.9877

$AIC
[1] 8.29937

$AICc
[1] 8.299698

$BIC
[1] 8.358917
Code
summary(fit1)
Series: res.fit 
ARIMA(2,0,1)(2,1,1)[12] 

Coefficients:
        ar1      ar2      ma1    sar1    sar2     sma1
      1.392  -0.4079  -0.7879  0.0674  -0.072  -0.8284
s.e.  0.079   0.0732   0.0580  0.0506   0.049   0.0304

sigma^2 = 226.2:  log likelihood = -2432.01
AIC=4878.03   AICc=4878.22   BIC=4908.67

Training set error measures:
                      ME     RMSE      MAE       MPE     MAPE      MASE
Training set 0.006210686 14.81186 11.34788 -11.06374 110.0339 0.6029853
                    ACF1
Training set -0.01093927

SARIMA(1,1,1)x(0,1,1)[12]

Code
fit2 <- Arima(res.fit , order=c(1,1,1), seasonal = list(order = c(0,1,1), period = 12))

Model Fitting Visual Results and Residuals

Code
model_output2 <- capture.output(sarima(res.fit,1,1,1,0,1,1,12))

Code
res.fit |>
  Arima(order=c(1,1,1), seasonal = list(order = c(0,1,1), period = 12)) |>
  residuals() |> ggtsdisplay()

Code
checkresiduals(fit2)


    Ljung-Box test

data:  Residuals from ARIMA(1,1,1)(0,1,1)[12]
Q* = 45.782, df = 21, p-value = 0.001366

Model df: 3.   Total lags used: 24

There is one significant spike in the ACF, and this model also fails the Ljung-Box test (p = 0.0014 < 0.05).

Model Fitting

Code
res.fit |>
  Arima(order=c(1,1,1), seasonal = list(order = c(0,1,1), period = 12))
Series: res.fit 
ARIMA(1,1,1)(0,1,1)[12] 

Coefficients:
         ar1      ma1     sma1
      0.4485  -0.8254  -0.8340
s.e.  0.0637   0.0409   0.0232

sigma^2 = 229.8:  log likelihood = -2434.7
AIC=4877.4   AICc=4877.47   BIC=4894.9

Model Output Diagnostics

Code
cat(model_output2[30:61], model_output2[length(model_output2)], sep = "\n")
$fit

Call:
arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), 
    include.mean = !no.constant, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc, 
        REPORT = 1, reltol = tol))

Coefficients:
         ar1      ma1     sma1
      0.4485  -0.8254  -0.8340
s.e.  0.0637   0.0409   0.0232

sigma^2 estimated as 228.6:  log likelihood = -2434.7,  aic = 4877.4

$degrees_of_freedom
[1] 584

$ttable
     Estimate     SE  t.value p.value
ar1    0.4485 0.0637   7.0433       0
ma1   -0.8254 0.0409 -20.2006       0
sma1  -0.8340 0.0232 -35.8717       0

$AIC
[1] 8.309036

$AICc
[1] 8.309106

$BIC
[1] 8.338849
Code
summary(fit2)
Series: res.fit 
ARIMA(1,1,1)(0,1,1)[12] 

Coefficients:
         ar1      ma1     sma1
      0.4485  -0.8254  -0.8340
s.e.  0.0637   0.0409   0.0232

sigma^2 = 229.8:  log likelihood = -2434.7
AIC=4877.4   AICc=4877.47   BIC=4894.9

Training set error measures:
                     ME     RMSE      MAE       MPE     MAPE      MASE
Training set -0.2983109 14.95631 11.52741 -17.96732 111.7469 0.6125249
                    ACF1
Training set -0.01297584


Model Selection

From the two model diagnostics above, SARIMA(1,1,1)x(0,1,1)[12] and SARIMA(2,0,1)x(2,1,1)[12] show a similar number of significant spikes in the residual ACF and PACF plots, and their training set error measures are close. SARIMA(1,1,1)x(0,1,1)[12] has the lower AIC, AICc, and BIC, while SARIMA(2,0,1)x(2,1,1)[12] has a slightly smaller sigma squared (226.2 vs. 229.8), i.e., a slightly smaller innovation variance. Since the criteria disagree, we use cross validation to choose between them.
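The selection logic used here (prefer lower AIC/BIC; sigma squared is the estimated innovation variance) can be illustrated on simulated data with base R's arima(); the orders and coefficients below are made up for the example:

```r
# Comparing an underspecified and a correctly specified model by AIC.
set.seed(7)
y <- arima.sim(model = list(ar = 0.5, ma = 0.6), n = 300)

fit_small <- arima(y, order = c(1, 0, 0))  # ignores the MA component
fit_right <- arima(y, order = c(1, 0, 1))  # matches the generating process

AIC(fit_small)                       # penalized likelihood; lower is better
AIC(fit_right)                       # the better-specified model wins here
fit_small$sigma2; fit_right$sigma2   # innovation variances
```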

Cross Validation

Code
n=length(res.fit)
k=120
 
# n-k=480; 480/12=40 possible 12-month folds; the first 10 are evaluated below
 
rmse1 <- matrix(NA, 40,12)
rmse2 <- matrix(NA,40,12)
rmse3 <- matrix(NA,40,12)

st <- tsp(res.fit)[1]+(k-1)/12 

for(i in 1:10)
{
  # expanding training window, followed by a 12-month test window
  xtrain <- window(res.fit, end=st + i-1)
  xtest <- window(res.fit, start=st + (i-1) + 1/12, end=st + i)
  
  # fit1: SARIMA(2,0,1)x(2,1,1)[12] from auto.arima(); fit2: SARIMA(1,1,1)x(0,1,1)[12]
  # from the grid search; fit3: SARIMA(2,1,1)x(2,1,1)[12] as an extra candidate
  
  fit <- Arima(xtrain, order=c(2,0,1), seasonal=list(order=c(2,1,1), period=12),
                include.drift=TRUE, method="CSS")
  fcast <- forecast(fit, h=12)
  
  fit2 <- Arima(xtrain, order=c(1,1,1), seasonal=list(order=c(0,1,1), period=12),
                include.drift=TRUE, method="CSS")
  fcast2 <- forecast(fit2, h=12)
  
  fit3 <- Arima(xtrain, order=c(2,1,1), seasonal=list(order=c(2,1,1), period=12),
                include.drift=TRUE, method="CSS")
  fcast3 <- forecast(fit3, h=12)
  
  # absolute forecast error for each month of the test window
  rmse1[i,1:length(xtest)]  <- sqrt((fcast$mean-xtest)^2)
  rmse2[i,1:length(xtest)] <- sqrt((fcast2$mean-xtest)^2)
  rmse3[i,1:length(xtest)] <- sqrt((fcast3$mean-xtest)^2)
}

plot(1:12, colMeans(rmse1,na.rm=TRUE), type="l", col=2, xlab="horizon", ylab="RMSE")
lines(1:12, colMeans(rmse2,na.rm=TRUE), type="l",col=3)
lines(1:12, colMeans(rmse3,na.rm=TRUE), type="l",col=4)
legend("topleft",legend=c("fit1","fit2","fit3"),col=2:4,lty=1)

Based on the cross-validation RMSE curves, SARIMA(2,0,1)x(2,1,1)[12] achieves the lowest RMSE, so we use it for the final model.

Model Fitting - SARIMA(2,0,1)x(2,1,1)[12]

Code
xreg <- cbind(RNWBL = dd[, "Renewable"],
              PTRLM = dd[, "Petroleum"],
              NTRLG = dd[, "Natural Gas"],
              CL = dd[, "Coal"])


fit <- Arima(dd$`CO2 Emissions`,order=c(2,0,1),seasonal = c(2,1,1),xreg=xreg,method="CSS")
summary(fit)
Series: dd$`CO2 Emissions` 
Regression with ARIMA(2,0,1)(2,1,1)[12] errors 

Coefficients:
         ar1      ar2      ma1     sar1     sar2     sma1    RNWBL   PTRLM
      1.5783  -0.5791  -0.8522  -0.0893  -0.1104  -0.6919  -0.0009  0.0733
s.e.  0.0731   0.0725   0.0513   0.0578   0.0509   0.0446   0.0164  0.0047
       NTRLG       CL
      0.1223  -0.0084
s.e.  0.0041   0.0069

sigma^2 = 83.51:  log likelihood = -2143.52

Training set error measures:
                     ME     RMSE      MAE         MPE     MAPE      MASE
Training set -0.3175613 8.969186 6.579235 -0.05707093 1.127818 0.2976303
                    ACF1
Training set -0.01008774

Forecast - SARIMA(2,0,1)x(2,1,1)[12]

In order to forecast the CO2 Emissions variable with the full regression model, we first need forecasts of the exogenous variables: Renewable Energy, Petroleum, Natural Gas, and Coal.

Here we use auto.arima() to fit models to the Renewable Energy, Petroleum, Natural Gas, and Coal variables.
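The same two-step pattern can be sketched in base R with stats::arima() and predict() (simulated series standing in for the real variables; forecast() with an xreg-fitted model, as used below, follows the same logic):

```r
# Step 1: fit the target with an exogenous regressor.
# Step 2: forecast the regressor with its own model.
# Step 3: feed that forecast into predict() via newxreg.
set.seed(99)
n <- 180
x <- as.numeric(arima.sim(model = list(ar = 0.7), n = n + 12))
y <- 1.2 * x[1:n] + arima.sim(model = list(ar = 0.4), n = n)

fit_y <- arima(y, order = c(1, 0, 0), xreg = x[1:n])  # target model
fit_x <- arima(x[1:n], order = c(1, 0, 0))            # regressor model

x_fc <- predict(fit_x, n.ahead = 12)$pred             # forecast the regressor
y_fc <- predict(fit_y, n.ahead = 12, newxreg = x_fc)$pred
length(y_fc)  # 12 forecasted values of the target
```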

Fitting SARIMA model to different energy consumption variables

Code
RNWBL_fit <- auto.arima(dd$Renewable) #fitting a SARIMA model to the renewable energy variable
summary(RNWBL_fit)
Series: dd$Renewable 
ARIMA(0,1,2)(1,1,2)[12] 

Coefficients:
          ma1      ma2    sar1     sma1    sma2
      -0.2496  -0.1966  0.3548  -1.1398  0.2403
s.e.   0.0408   0.0414  0.5676   0.5770  0.4817

sigma^2 = 534.4:  log likelihood = -2681.06
AIC=5374.11   AICc=5374.26   BIC=5400.36

Training set error measures:
                    ME     RMSE      MAE        MPE     MAPE      MASE
Training set 0.2969799 22.76701 17.49176 -0.0738994 3.172679 0.4860948
                    ACF1
Training set 0.006186365
Code
fRNWBL <- forecast(RNWBL_fit, h=60)
Code
PTRLM_fit <- auto.arima(dd$Petroleum) #fitting a SARIMA model to the petroleum variable
summary(PTRLM_fit)
Series: dd$Petroleum 
ARIMA(3,0,2)(2,1,1)[12] 

Coefficients:
          ar1     ar2     ar3     ma1     ma2    sar1     sar2     sma1
      -0.0239  0.2107  0.6388  0.7031  0.4326  0.0213  -0.0886  -0.8043
s.e.   0.1010  0.0813  0.1060  0.1306  0.1478  0.0558   0.0511   0.0355

sigma^2 = 7153:  log likelihood = -3446.92
AIC=6911.83   AICc=6912.14   BIC=6951.22

Training set error measures:
                   ME     RMSE      MAE         MPE     MAPE      MASE
Training set 1.419845 83.15279 59.65203 -0.02073471 2.058544 0.5540171
                    ACF1
Training set -0.03071941
Code
fPTRLM <- forecast(PTRLM_fit, h=60)
Code
NTRLG_fit <- auto.arima(dd$`Natural Gas`) #fitting a SARIMA model to the natural gas variable
summary(NTRLG_fit)
Series: dd$`Natural Gas` 
ARIMA(2,1,1)(1,1,2)[12] 

Coefficients:
         ar1      ar2      ma1     sar1     sma1     sma2
      0.4262  -0.0133  -0.9085  -0.4149  -0.2781  -0.4480
s.e.  0.0475   0.0455   0.0236   0.1921   0.1801   0.1392

sigma^2 = 9115:  log likelihood = -3512.73
AIC=7039.46   AICc=7039.66   BIC=7070.09

Training set error measures:
                   ME     RMSE      MAE        MPE     MAPE      MASE
Training set 6.345775 93.94888 69.62737 0.09510603 3.560609 0.6725354
                     ACF1
Training set -0.005717725
Code
fNTRLG <- forecast(NTRLG_fit, h=60)
Code
CL_fit <- auto.arima(dd$Coal) #fitting a SARIMA model to the coal variable
summary(CL_fit)
Series: dd$Coal 
ARIMA(2,1,2)(2,1,1)[12] 

Coefficients:
          ar1     ar2      ma1      ma2    sar1     sar2     sma1
      -0.1566  0.4381  -0.1342  -0.6282  0.0155  -0.1504  -0.7816
s.e.   0.5999  0.2635   0.5924   0.4288  0.0520   0.0481   0.0331

sigma^2 = 2982:  log likelihood = -3184.71
AIC=6385.42   AICc=6385.67   BIC=6420.43

Training set error measures:
                    ME     RMSE      MAE        MPE     MAPE      MASE
Training set -1.378761 53.69276 39.66915 -0.1088342 2.908254 0.5045871
                     ACF1
Training set -0.001052996
Code
fCL <- forecast(CL_fit,  h=60)
Code
fxreg <- cbind(RNWBL = fRNWBL$mean,
               PTRLM = fPTRLM$mean,
               NTRLG = fNTRLG$mean,
               CL = fCL$mean)

fcast <- forecast(fit, xreg=fxreg) # fcast$mean holds the forecasted values
g <- autoplot(fcast) + 
  xlab("Year") +
  ylab("Million Metric Tons of Carbon Dioxide") +
  ggtitle("CO2 Emissions Forecast from SARIMAX Model")
ggplotly(g)